AITopics | svm 0

Collaborating Authors

svm 0

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

170035f97007fdfa665880107b56f384-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsFeb-8-2026, 07:46:43 GMT

flavor svm 0, only svm 0, svm 0, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)

Genre: Research Report (0.46)

Industry:

Law (1.00)
Health & Medicine (1.00)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis > Beverages (1.00)
Information Technology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Murmur2Vec: A Hashing Based Solution For Embedding Generation Of COVID-19 Spike Sequences

Ali, Sarwan, Murad, Taslim

arXiv.org Artificial IntelligenceDec-12-2025

Early detection and characterization of coronavirus disease (COVID-19), caused by SARS-CoV-2, remain critical for effective clinical response and public-health planning. The global availability of large-scale viral sequence data presents significant opportunities for computational analysis; however, existing approaches face notable limitations. Phylogenetic tree-based methods are computationally intensive and do not scale efficiently to today's multi-million-sequence datasets. Similarly, current embedding-based techniques often rely on aligned sequences or exhibit suboptimal predictive performance and high runtime costs, creating barriers to practical large-scale analysis. In this study, we focus on the most prevalent SARS-CoV-2 lineages associated with the spike protein region and introduce a scalable embedding method that leverages hashing to generate compact, low-dimensional representations of spike sequences. These embeddings are subsequently used to train a variety of machine learning models for supervised lineage classification. We conduct an extensive evaluation comparing our approach with multiple baseline and state-of-the-art biological sequence embedding methods across diverse metrics. Our results demonstrate that the proposed embeddings offer substantial improvements in efficiency, achieving up to 86.4\% classification accuracy while reducing embedding generation time by as much as 99.81\%. This highlights the method's potential as a fast, effective, and scalable solution for large-scale viral sequence analysis.

artificial intelligence, bioinformatics, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2512.10147

Country:

North America > United States (0.46)
Asia > China (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)

Technology:

Information Technology > Biomedical Informatics > Translational Bioinformatics (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Empirical Risk Minimization Under Fairness Constraints

Michele Donini, Luca Oneto, Shai Ben-David, John S. Shawe-Taylor, Massimiliano Pontil

Neural Information Processing SystemsNov-20-2025, 17:53:45 GMT

We address the problem of algorithmic fairness: ensuring that sensitive information does not unfairly influence the outcome of a classifier. We present an approach based on empirical risk minimization, which incorporates a fairness constraint into the learning problem. It encourages the conditional risk of the learned classifier to be approximately constant with respect to the sensitive variable. We derive both risk and fairness bounds that support the statistical consistency of our methodology. We specify our approach to kernel methods and observe that the fairness requirement implies an orthogonality constraint which can be easily added to these methods. We further observe that for linear models the constraint translates into a simple data preprocessing step. Experiments indicate that the method is empirically effective and performs favorably against state-of-the-art approaches.

artificial intelligence, fairness, machine learning, (15 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
South America > Chile (0.04)
North America > Canada > Quebec > Montreal (0.04)
(3 more...)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Add feedback

Breaking the Euclidean Barrier: Hyperboloid-Based Biological Sequence Analysis

Ali, Sarwan, Mansoor, Haris, Patterson, Murray

arXiv.org Artificial IntelligenceOct-2-2025

Genomic sequence analysis plays a crucial role in various scientific and medical domains. Traditional machine-learning approaches often struggle to capture the complex relationships and hierarchical structures of sequence data when working in high-dimensional Euclidean spaces. This limitation hinders accurate sequence classification and similarity measurement. To address these challenges, this research proposes a method to transform the feature representation of biological sequences into the hyperboloid space. By applying a transformation, the sequences are mapped onto the hyperboloid, preserving their inherent structural information. Once the sequences are represented in the hyperboloid space, a kernel matrix is computed based on the hyperboloid features. The kernel matrix captures the pairwise similarities between sequences, enabling more effective analysis of biological sequence relationships. This approach leverages the inner product of the hyperboloid feature vectors to measure the similarity between pairs of sequences. The experimental evaluation of the proposed approach demonstrates its efficacy in capturing important sequence correlations and improving classification accuracy.

artificial intelligence, bioinformatics, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2510.01118

Country: Asia (0.67)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.70)

Technology:

Information Technology > Biomedical Informatics (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Add feedback

Protected Probabilistic Classification Library

Petej, Ivan

arXiv.org Artificial IntelligenceSep-16-2025

This paper introduces a new Python package specifically designed to address calibration of probabilistic classifiers under dataset shift. The method is demonstrated in binary and multi-class settings and its effectiveness is measured against a number of existing post-hoc calibration methods. The empirical results are promising and suggest that our technique can be helpful in a variety of settings for batch and online learning classification problems where the underlying data distribution changes between the training and test sets.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2509.11267

Genre: Research Report > New Finding (0.46)

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

Categorical Classification of Book Summaries Using Word Embedding Techniques

Keskin, Kerem, Keleş, Mümine Kaya

arXiv.org Artificial IntelligenceJul-30-2025

In this study, book summaries and categories taken from book sites were classified using word embedding methods, natural language processing techniques and machine learning algorithms. In addition, one hot encoding, Word2Vec and Term Frequency - Inverse Document Frequency (TF - IDF) methods, which are frequently used word embedding methods were used in this study and their success was compared. Additionally, the combination table of the pre - processing methods used is shown and added to the table. Looking at the results, it was observed that Support Vector Machine, Naive Bayes and Logistic Regression Models and TF - IDF and One - Hot Encoder word embedding techniques gave more successful results for Turkish texts. Using word2vec to process big text data.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2507.21058

Country:

Europe > Kosovo (0.16)
Asia > Middle East > Republic of Türkiye (0.14)

Genre: Research Report > New Finding (0.74)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)

Add feedback

The Impact of Feature Scaling In Machine Learning: Effects on Regression and Classification Tasks

Pinheiro, João Manoel Herrera, de Oliveira, Suzana Vilas Boas, Silva, Thiago Henrique Segreto, Saraiva, Pedro Antonio Rabelo, de Souza, Enzo Ferreira, Godoy, Ricardo V., Ambrosio, Leonardo André, Becker, Marcelo

arXiv.org Machine LearningJul-24-2025

This research addresses the critical lack of comprehensive studies on feature scaling by systematically evaluating 12 scaling techniques - including several less common transformations - across 14 different Machine Learning algorithms and 16 datasets for classification and regression tasks. We meticulously analyzed impacts on predictive performance (using metrics such as accuracy, MAE, MSE, and $R^2$) and computational costs (training time, inference time, and memory usage). Key findings reveal that while ensemble methods (such as Random Forest and gradient boosting models like XGBoost, CatBoost and LightGBM) demonstrate robust performance largely independent of scaling, other widely used models such as Logistic Regression, SVMs, TabNet, and MLPs show significant performance variations highly dependent on the chosen scaler. This extensive empirical analysis, with all source code, experimental results, and model parameters made publicly available to ensure complete transparency and reproducibility, offers model-specific crucial guidance to practitioners on the need for an optimal selection of feature scaling techniques.

artificial intelligence, lgbm 0, machine learning, (17 more...)

arXiv.org Machine Learning

2506.08274

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
(2 more...)

Add feedback

Explaining Concept Shift with Interpretable Feature Attribution

Lyu, Ruiqi, Turcan, Alistair, Wilder, Bryan

arXiv.org Machine LearningMay-28-2025

Regardless the amount of data a machine learning (ML) model is trained on, there will inevitably be data that differs from their training set, lowering model performance. Concept shift occurs when the distribution of labels conditioned on the features changes, making even a well-tuned ML model to have learned a fundamentally incorrect representation. Identifying these shifted features provides unique insight into how one dataset differs from another, considering the difference may be across a scientifically relevant dimension, such as time, disease status, population, etc. In this paper, we propose SGShift, a model for detecting concept shift in tabular data and attributing reduced model performance to a sparse set of shifted features. SGShift models concept shift with a Generalized Additive Model (GAM) and performs subsequent feature selection to identify shifted features. We propose further extensions of SGShift by incorporating knockoffs to control false discoveries and an absorption term to account for models with poor fit to the data. We conduct extensive experiments in synthetic and real data across various ML models and find SGShift can identify shifted features with AUC $>0.9$ and recall $>90\%$, often 2 or 3 times as high as baseline methods.

artificial intelligence, dataset, machine learning, (16 more...)

arXiv.org Machine Learning

2505.20634

Country:

North America > United States > California (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.93)
Health & Medicine > Therapeutic Area > Immunology (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)

Add feedback

Predicting BVD Re-emergence in Irish Cattle From Highly Imbalanced Herd-Level Data Using Machine Learning Algorithms

Mimnagh, Niamh, Parnell, Andrew, McAloon, Conor, Carlson, Jaden, Guelbenzu, Maria, Brock, Jonas, Barrett, Damien, McGrath, Guy, Tratalos, Jamie, Moral, Rafael

arXiv.org Artificial IntelligenceApr-18-2025

Bovine Viral Diarrhoea (BVD) has been the focus of a successful eradication programme in Ireland, with the herd-level prevalence declining from 11.3% in 2013 to just 0.2% in 2023. As the country moves toward BVD freedom, the development of predictive models for targeted surveillance becomes increasingly important to mitigate the risk of disease re-emergence. In this study, we evaluate the performance of a range of machine learning algorithms, including binary classification and anomaly detection techniques, for predicting BVD-positive herds using highly imbalanced herd-level data. We conduct an extensive simulation study to assess model performance across varying sample sizes and class imbalance ratios, incorporating resampling, class weighting, and appropriate evaluation metrics (sensitivity, positive predictive value, F1-score and AUC values). Random forests and XGBoost models consistently outperformed other methods, with the random forest model achieving the highest sensitivity and AUC across scenarios, including real-world prediction of 2023 herd status, correctly identifying 219 of 250 positive herds while halving the number of herds that require compared to a blanket-testing strategy.

artificial intelligence, data mining, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2504.13116

Country: Europe > Austria (0.28)

Genre: Research Report > Experimental Study (0.36)

Industry: Health & Medicine > Therapeutic Area (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.71)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.69)

Add feedback